Abstract In this paper, we show how to efficiently and 
effectively extract a class of "low-rank textures" in a 3D 
scene from 2D images despite significant corruptions 
and warping. The low-rank textures capture geometri- 
cally meaningful structures in an image, which encom- 
pass conventional local features such as edges and cor- 
ners as well as all kinds of regular, symmetric patterns 
ubiquitous in urban environments and man-made ob- 
jects. Our approach to finding these low-rank textures 
leverages the recent breakthroughs in convex optimiza- 
tion that enable robust recovery of a high-dimensional 
low-rank matrix despite gross sparse errors. In the case 
of planar regions with significant affine or projective 
deformation, our method can accurately recover both 
the intrinsic low-rank texture and the precise domain 
transformation, and hence the 3D geometry and ap- 
pearance of the planar regions. Extensive experimental 
results demonstrate that this new technique works ef- 
fectively for many regular and near-regular patterns or 
objects that are approximately low-rank, such as symmetrical
patterns, building facades, printed texts, and 
human faces. 

1 Introduction 

One of the fundamental problems in computer vision is 
to identify certain feature points or salient regions in 
images. These points and regions are the basic building 
blocks for almost all high-level vision applications such 
as image matching, 3D reconstruction, object recogni- 
tion, and scene understanding. Through the years, a 
large number of methods have been proposed in liter- 
ature for extracting various types of feature points or 
salient regions. The detected points or regions typically 
represent parts of the image that have distinctive geometric
or statistical properties such as Canny edges
(Canny, 1986), Harris corners (Harris and Stephens,
1988), and textons (Leung and Malik, 2001).

One of the important applications of detecting fea- 
ture points or regions in images is to establish point- 
wise correspondences or measure similarity between dif- 
ferent images of the same object. This problem is espe- 
cially challenging if the images are taken from different 
viewpoints under different lighting conditions. Thus, it 
is desirable that the detected points/regions are some- 
what stable or invariant under transformations incurred 
by changes in viewpoint or illumination. In the past two 
decades, numerous "invariant" features and descriptors 
have been proposed, studied, compared, and combined 
in the literature (see (Mikolajczyk and Schmid, 2005;
Winder and Brown, 2007) and references therein). Some
of the earliest work in this genre was based on using
a Markov model to study dependences between various
wavelet subbands for rotation invariant textures (Cohen
et al, 1991; Chen and Kundu, 1994; Wu and Wei, 1996;
Do and Vetterli, 2002). There has also been a lot
of study in using different kinds of basis functions, such
as Gabor wavelets, to filter the image and compute rotation
invariant features from the filtered image (see
(Haley and Manjunath, 1999; Greenspan et al, 1994;
Madiraju and Liu, 1994) and references therein).

A widely used invariant feature descriptor is the 
scale invariant feature transform (SIFT) (Lowe, 2004),
which to a large extent is invariant to changes in rotation
and scale (i.e., similarity transformations) and
illumination. Nevertheless, if the images are shot from 
very different viewpoints, SIFT is not very successful in 
establishing reliable correspondences. This problem has 
been partially addressed by its affine-invariant version
(Mikolajczyk and Schmid, 2004; Morel and Yu, 2009).

However, even these extensions of SIFT are limited in 
practice: Although the deformation of a small distant 
patch can be well-approximated by an affine transform, 
projective transformations are necessary to describe the 
deformation of a large region viewed through a per- 
spective camera. There has been relatively limited work 
on projection invariant texture representation (Chang
et al, 1987; Kondepudy and Healey, 1994). To the best
of our knowledge, from a practical standpoint, there are 
no feature descriptors that are truly invariant (or even 
approximately so) under projective transformations or 
homographies. In addition, these methods normally do 
not deal with other concurrent nuisance factors such as 
illumination changes or partial occlusions and corrup- 
tions that could severely undermine feature extraction 
from real images. 

Despite tremendous effort in the past few decades to 
search for better and richer classes of invariant features 
in images, there seems to be a fundamental dilemma 
that none of the existing methods have been able to 
resolve: On the one hand, if we consider typical classes 
of transformations incurred on the image domain by 
changing camera viewpoint and on the image intensity 
by changing contrast or illumination, then in a strict 
mathematical sense, invariants of the 2D image are ex- 
tremely sparse and scarce - essentially only the topology 
of the extrema of the image function remains invariant, 
known as attributed Reeb tree (ART) (Sundaramoorthi
et al, 2009). The numerous "invariant" image features
proposed in the computer vision literature, including
the ones mentioned above, are at best approximately
invariant, and often only to a limited extent.
On the other hand, a typical 3D scene is rich in regu- 
lar structures that are full of invariants (with respect 
to 3D Euclidean transformations or other well-behaved
deformation groups). For instance, in an urban environment,
the scene is typically filled with man-made
objects that have parallel edges, right-angled corners, 
regular shapes, symmetric structures, and repeated patterns
(see Figures 1 and 2). These geometric structures
are rich in properties that are invariant under all types 
of subgroups of the 3D Euclidean group. As a result, 
their 2D (affine or perspective) images encode very rich 
and precise information about the 3D geometry and
structure of the objects in the scene (Ma et al, 2004;
Kosecka and Zhang, 2005; Schindler et al, 2008).



In this paper we propose a technique that aims to 
resolve the above dilemma about invariant features. We 
contend that instead of trying to seek local invariant 
features of the image that are either scarce or imprecise, 
we should 

aim to directly extract certain invariant struc- 
tures in 3D through their 2D images by undo- 
ing the (affine or projective) domain transfor- 
mations. 

That is, we cast our quest for "invariance" directly as 
an inverse problem of recovering 3D information from 
2D images. However, to solve such challenging inverse 
problems, we will need some new powerful computa- 
tional tools which we will introduce and develop in this 
paper. 

Many methods have been developed in the past to 
detect and extract all types of regular, symmetric patterns
from images under affine or projective transforms
(see (Park et al, 2008) for a recent evaluation). As symmetry
is not a property that depends on a small neighborhood
of a pixel, it can only be detected from a relatively
large region of the image. However, almost all
existing methods for detecting symmetric regions and 
patterns start by extracting and putting together local
features such as corners and edges (Yang et al, 2005)
or more advanced local features such as SIFT points
(Schindler et al, 2008). As feature detection and edge
extraction themselves are sensitive to local image variations
such as noise, occlusion, and illumination change,
such symmetry detection methods inherently lack ro- 
bustness and stability. In addition, as we will see in this 
paper, many regular structures and symmetric patterns 
do not even have distinctive features. Thus, we need a 
more general, effective, and robust way of detecting and 
extracting regular structures in images despite signifi- 
cant distortion and corruption. 

(e) Output (r = 14) (f) Output (r = 8) (g) Output (r = 19) (h) Output (r = 6)

Fig. 1 Low-rank Textures Automatically Rectified by Our Method. From left to right: a butterfly; a face; a tablet of Chinese
characters; and the Leaning Tower of Pisa. Top: red windows denote the original input, green windows denote the deformed texture
returned by our method; Bottom: textures in the green window rectified for display. We notice that the rank of the image matrix,
denoted by r, is much lower for the rectified textures.

Our goal in this paper is to extract invariant information
from regions in a 2D image that correspond to
a very rich class of regular patterns on a planar surface
in 3D, whose appearance can be modeled (approximately)
as a "low-rank" matrix (see Figure 1 for some
examples). In some sense, many conventional features
mentioned above, such as edges, corners, and symmetric
patterns, can all be considered as special instances of such
low-rank textures (see Figure 2). Clearly, an image of
such a texture may be deformed by the camera projection
and undergo a certain domain transformation (say
affine or projective). The transformed texture, viewed
as a matrix, in general is no longer low-rank in the im- 
age domain. Nevertheless, by utilizing advanced convex 
optimization tools from matrix rank minimization, we 
will show how to simultaneously recover such a low-rank 
texture from its deformed image and the associated de- 
formation. 

Our method directly uses raw pixel values of the im- 
age (window) and there is no need for any pre-extraction 
of low-level, local features such as corners, edges, SIFT,
and DoG features. The proposed solution and algorithm 
are inherently robust to gross errors caused by corrup- 
tion, occlusion, or cluttered background as long as they 
affect a small fraction of the image pixels. Furthermore, 
our method applies to any image region where there 
are sufficient low-rank textures, regardless of the size 
of their spatial support. Thus, we are able to rectify 
not only small local patches around an edge or a corner 
but also large global symmetric regions such as an en- 
tire facade of a building. We believe that this is a very 
powerful new tool that allows people to accurately extract
rich structural and geometric information about
the 3D scene from its 2D images, information that is truly
invariant to image domain transformations.



Organization of this paper: The remainder of this paper
is organized as follows: Section 2 gives a rigorous
definition of "low-rank textures" and formulates
the mathematical problem associated with extracting
such textures. Section 3 gives an efficient and effective
algorithm for solving the problem. We provide exten- 
sive experimental results to verify the efficacy of the 
proposed algorithm as well as the usefulness of the ex- 
tracted low-rank textures in Section 4. In Section 5, we 
discuss some potential extensions and variations to the 
basic formulation. 

2 Transform Invariant Low-rank Textures 

2.1 Definition of Low-rank Textures 

In this paper, we consider a 2D texture as a function
$I^0(x, y)$, defined on $\mathbb{R}^2$. We say that $I^0$ is a low-rank
texture if the family of one-dimensional functions
$\{I^0(x, y_0) \mid y_0 \in \mathbb{R}\}$ spans a finite low-dimensional linear
subspace, i.e.,

$$r = \dim\left(\operatorname{span}\{I^0(x, y_0) \mid y_0 \in \mathbb{R}\}\right) \le k \tag{1}$$

for some small positive integer $k$. If $r$ is finite, then we
refer to $I^0$ as a rank-$r$ texture. Figure 2 shows some
ideal low-rank textures: a vertical or horizontal edge 
(or slope) can be considered as a rank-1 texture; and a 
corner can be considered as a rank-2 texture. To a large 
extent, the notion of low-rank texture unifies many of 
the conventional local features. By this definition, it is 
easy to see that images of regular symmetric patterns 
always lead to low-rank textures. Thus, the notion of 
low-rank texture encompasses a much broader range of 
"features" or regions than corners and edges. 

Given a low-rank texture, obviously its rank is in- 
variant under any scaling of the function, as well as 
scaling or translation in the x and y coordinates. That 
is, if 

$$g(x, y) = c\, I^0(ax + t_1,\; by + t_2)$$

for some constants $a, b, c \in \mathbb{R}_+$, $t_1, t_2 \in \mathbb{R}$, then $g(x, y)$
and $I^0(x, y)$ have the same rank according to our definition
in (1). For most practical purposes, it suffices
to recover any scaled or translated version of the low-rank
texture $I^0(x, y)$, as the remaining ambiguity left
in the scaling can often be easily resolved in practice by
imposing additional constraints on the texture (see Section
3.2). Hence, in this paper, unless otherwise stated,
we view two low-rank textures as equivalent if they are
scaled and translated versions of each other:

$$I^0(x, y) \sim c\, I^0(ax + t_1,\; by + t_2),$$

for all $a, b, c \in \mathbb{R}_+$, $t_1, t_2 \in \mathbb{R}$. In homogeneous representation,
this equivalence group consists of all elements of
the form:

$$\begin{bmatrix} a & 0 & t_1 \\ 0 & b & t_2 \\ 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{3 \times 3}, \quad a, b \in \mathbb{R}_+, \; t_1, t_2 \in \mathbb{R}. \tag{2}$$



In practice, we are never given the 2D texture as
a continuous function in $\mathbb{R}^2$. Typically, we only have
its values sampled on a finite discrete grid in $\mathbb{Z}^2$, of
size $m \times n$ say. In this case, the 2D texture $I^0(x, y)$
is represented by an $m \times n$ real matrix. For a low-rank
texture, we always assume that the size of the sampling
grid is significantly larger than the intrinsic rank of the
texture, i.e.,^1

$$r \ll \min\{m, n\}.$$

It is easy to show that as long as the sampling rate
is not one of the aliasing frequencies of the function
$I^0$, the resulting matrix has the same rank as the continuous
function.^2 Thus, the 2D texture $I^0(x, y)$, when
discretized as a matrix, also denoted by $I^0$ for convenience,
has very low rank relative to its dimensions.
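As a minimal sketch of the discretized definition, the following NumPy snippet samples three ideal textures on a 32 x 32 grid and computes their matrix ranks. The grid size, the 4 x 4 checkerboard block size, and the variable names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

n = 32
half = n // 2

# Horizontal step edge: all columns share one 1D profile, so the rank is 1.
edge = np.zeros((n, n))
edge[half:, :] = 1.0

# L-shaped corner: rows are either all ones or a half-step profile, so the rank is 2.
corner = np.zeros((n, n))
corner[:half, :] = 1.0
corner[:, :half] = 1.0

# Checkerboard with 4x4 blocks: rows alternate between a profile p and 1 - p.
ii, jj = np.indices((n, n))
board = ((ii // 4 + jj // 4) % 2).astype(float)

ranks = {name: int(np.linalg.matrix_rank(img))
         for name, img in [("edge", edge), ("corner", corner), ("board", board)]}

# Rescaling the intensity (c = 3) and resampling the axes (2x and 3x pixel
# replication) leaves the rank unchanged.
scaled = 3.0 * np.kron(board, np.ones((2, 3)))
rank_scaled = int(np.linalg.matrix_rank(scaled))

print(ranks, rank_scaled)
```

The last two lines illustrate the equivalence under scaling: the 64 x 96 resampled checkerboard has the same rank as the original 32 x 32 one.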

Remark 1 (Low-rank Textures vs. Random Textures)
Conventionally, the word "texture" is used to describe
image regions that exhibit certain spatially stationary
stochastic properties (e.g., grass, sand, fabrics). Such
textures can be considered as random samples from a
stationary stochastic process (Levina and Bickel, 2006)
and generally have full rank when viewed as matrices.
The low-rank "textures" defined here are complementary
to such random textures. Here, low-rank textures
correspond to regions in an image that have rather deterministic
regular or periodic structures.

^1 Notice that the scale of the window needs to be large enough
to meet this assumption.

^2 In other words, the resolution of the image cannot be too low.



2.2 Deformed and Corrupted Low-rank Textures 

In practice, we typically never see a perfectly low-rank 
texture in a real image, largely due to two factors: 1. 
the change in viewpoint induces a transformation on 
the domain of the texture function; 2. the sampled val- 
ues of the texture function are subject to many types 
of corruption such as quantization, noise, occlusions, 
etc. In order to correctly extract the intrinsic low-rank 
textures from such deformed and corrupted image mea- 
surements, we must first carefully model those factors 
and then seek ways to eliminate them. 

Deformed Low-rank Textures. Although many surfaces
or structures in 3D exhibit low-rank textures, their images
do not! Suppose that a low-rank texture $I^0(x, y)$
lies on a planar surface in the scene. The image $I(x, y)$
that we observe from a certain viewpoint is a transformed
version of the original low-rank texture function:^3

$$I(x, y) = I^0 \circ \tau^{-1}(x, y) = I^0\left(\tau^{-1}(x, y)\right),$$

where $\tau : \mathbb{R}^2 \to \mathbb{R}^2$ belongs to a certain Lie group $G$.
In this paper, we assume $G$ is either the rotation group
$SO(2)$, or the 2D affine group $\mathrm{Aff}(2)$, or the homography
group $GL(3)$ acting linearly on the image domain.^4
In general, the transformed texture $I(x, y)$ as a matrix
is no longer low-rank. For instance, a horizontal edge
has rank one, but when rotated by 45°, it becomes a
full-rank diagonal edge (see Figure 2(a)).

Corrupted Low-rank Textures. In addition to domain
transformations, the observed image of the texture might
be corrupted by noise and occlusions or contain some
pixels from the surrounding background. We can model
such deviations as:

$$I = (I^0 + E) \circ \tau^{-1}$$

for some error matrix $E$. As a result, the image $I$ might
no longer be a low-rank texture. In this paper, we assume
that only a small fraction of the image pixels are
corrupted by large errors, and hence, $E$ is a sparse matrix.

^3 By now, one should understand the reason for modeling a low-rank
texture as a function defined on a continuous domain $\mathbb{R}^2$:
we can talk about domain transformation freely. Any image or
matrix representation of the texture is only a discrete sampling
of this function. This allows us to generate transformed images
of a low-rank texture by interpolating values of adjacent pixels.

^4 Nevertheless, in principle, our method works for more general
classes of domain deformations or camera projection models as
long as they can be modeled well by a finite-dimensional parametric
family.

(e) Output (r = 1) (f) Output (r = 2) (g) Output (r = 7) (h) Output (r = 14)

Fig. 2 Representative Examples of Low-rank Textures and Our Results. From left to right: an edge; a corner; a symmetric
pattern, and a license plate. Top: deformed textures (high-rank as matrices); Bottom: the recovered low-rank representations.
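Both effects described in this subsection, deformation and sparse corruption, can be checked numerically. The sketch below is illustrative only: the 45° edge is idealized as a lower-triangular 0/1 matrix rather than produced by interpolating a rotated image, and the 5% corruption level, random seed, and variable names are assumptions.

```python
import numpy as np

n = 32

# Horizontal edge: rank 1.
horizontal = np.zeros((n, n))
horizontal[n // 2:, :] = 1.0

# Idealized 45-degree edge: ones on and below the main diagonal.
# A triangular matrix with a nonzero diagonal is invertible, hence full rank.
diagonal = np.tril(np.ones((n, n)))

rank_horizontal = int(np.linalg.matrix_rank(horizontal))
rank_diagonal = int(np.linalg.matrix_rank(diagonal))

# Sparse gross corruption: about 5% of the pixels receive large errors.
rng = np.random.default_rng(0)
E = np.zeros((n, n))
idx = rng.choice(n * n, size=(n * n) // 20, replace=False)
signs = rng.choice([-1.0, 1.0], size=idx.size)
E.flat[idx] = signs * rng.uniform(1.0, 2.0, size=idx.size)

rank_corrupted = int(np.linalg.matrix_rank(horizontal + E))
sparsity = np.count_nonzero(E) / E.size

print(rank_horizontal, rank_diagonal, rank_corrupted, round(sparsity, 3))
```

Even though `E` touches only a small fraction of the pixels, the corrupted image is no longer rank one, which is why the error has to be modeled explicitly rather than absorbed into the texture.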

Our goal in this paper is to recover the exact low- 
rank texture from an image that contains a deformed 
and corrupted version of it. More precisely, we aim to 
solve the following problem: 

Problem 1 (Recovery of Low-rank Texture) Given
a deformed and corrupted image of a low-rank texture:
$I = (I^0 + E) \circ \tau^{-1}$, recover the low-rank texture $I^0$ and
the domain transformation $\tau \in G$.

The above formulation naturally leads to the following
optimization problem:

$$\min_{I^0, E, \tau} \; \operatorname{rank}(I^0) + \gamma \|E\|_0 \quad \text{s.t.} \quad I \circ \tau = I^0 + E, \tag{3}$$

where $\|E\|_0$ denotes the number of non-zero entries in
$E$. That is, we aim to find the texture $I^0$ of the lowest
possible rank and the error $E$ with the fewest possible
nonzero entries that agree with the observation $I$ up to
a domain transformation $\tau$. Here, $\gamma > 0$ is a weighting
parameter that trades off the rank of the texture versus
the sparsity of the error. For convenience, we refer to
the solution $I^0$ found to this problem as a Transform
Invariant Low-rank Texture (TILT).^5

Remark 2 (TILT vs. Affine-Invariant Features.) TILT
is fundamentally different from the affine-invariant features
or regions proposed in the literature (Mikolajczyk
and Schmid, 2004; Morel and Yu, 2009). Essentially,
those features are extensions to SIFT features in the
sense that their locations are very much detected in the
same way as SIFT. The difference is that around each
feature, an optimal affine transform is found that in
some way "normalizes" the local statistics, say by maximizing
the isotropy of the brightness pattern (Gårding
and Lindeberg, 1996). Here TILT finds the best local
deformation by minimizing the rank of the brightness
pattern in a robust way. It works the same way for any
image region of any size and for both affine and projective
transforms (or even more general transformation
groups that have smooth parameterization). More importantly,
as we will see in Section 4, our method is able
to rectify all kinds of regions that are approximately
low-rank (e.g., human faces, printed text) and the results
match very well with human perception. Unlike
SIFT features whose locations are difficult to predict 
or interpret by human vision, TILT has a nice WYSI- 
WYG property: 

''What You See Is What You Get.'' 

^5 By a slight abuse of terminology, we also refer to the procedure
of solving the optimization problem as TILT.

Remark 3 (TILT vs. RASL.) We note that the optimization
problem (3) is very similar to the robust image
alignment problem studied in Peng et al (2010a), known
as RASL. This is because both RASL and TILT use
the same mathematical framework (sparse and low-rank 
matrix decomposition with domain transformation) in 
their problem formulation. Although the formulation 
is similar, there are some important conceptual differ- 
ences between the two problems. For instance, RASL 
treats each image as a vector and does not make use of 
any spatial structure within each image, whereas in this 
paper, TILT uses matrix rank and sparsity to study spatial
structures within a 2D image. In this sense, RASL
and TILT are highly complementary to each other: they 
try to capture temporal and spatial linear correlation 
in images, respectively. From an algorithmic point of 
view, TILT is simpler than RASL since it deals with 
only one image and one domain transformation whereas 
RASL deals with multiple images and multiple trans- 
formations, one for each image. We will propose many 
extensions to TILT to handle a wider range of textures 
and symmetries, most of which are not applicable to the 
image alignment problem that RASL strives to solve. 
Although beyond the scope of this paper, it remains 
to be seen in the future if one can combine TILT and 
RASL together to develop a richer class of tools for 
extracting more information from images. 

Remark 4 (TILT vs. Transformed PCA.) One might
argue that the low-rank objective can be directly enforced,
as in Transformed Component Analysis (TCA)
proposed by Frey and Jojic (1999), which uses an EM
algorithm to compute principal components, subject to
domain transformations drawn from a known group.
TCA deals with Gaussian noise and essentially
minimizes the 2-norm of the error term $E$. So the reader
might wonder if such a "transformed principal component
analysis" approach could apply to our image rectification
problem here. Let us ignore gross corruption
or occlusion for the time being. We could attempt to
recover a rank-$r$ texture by solving the following optimization
problem:

$$\min_{I^0, \tau} \; \|I \circ \tau - I^0\|_F^2 \quad \text{s.t.} \quad \operatorname{rank}(I^0) \le r. \tag{4}$$

One can solve (4) by minimizing against the low-rank
component $I^0$ and the deformation $\tau$ iteratively: with
$\tau$ fixed, estimate the rank-$r$ component $I^0$ via PCA,
and with $I^0$ fixed, solve for the deformation $\tau$ in a greedy
fashion to minimize the least-squares objective.^6

Figure 3 shows some representative results of using
such a "Transformed PCA" approach. However, even
for simple patterns like the checker-board, it works only
with a correct initial guess of the rank r = 2 beforehand.
If we assume a wrong rank, say r = 1 or 3, solving (4)
would not converge to a correct solution, even with a
small initial deformation. For complex textures like the
building facade shown in Figure 3, whose rank is impossible
to guess in advance, we have to try all possibilities.
Moreover, (4) can only handle small Gaussian noise. For
images taken in the real world, partial occlusion and other
types of corruption are often present. The naive transformed
PCA does not work robustly for such images.
As we will see in the rest of this paper, the TILT algorithm
that we propose next can automatically find the
minimal matrix rank in an efficient manner and handle
very large deformations and non-Gaussian errors of
large magnitude.

^6 In fact, this simple iteration closely emulates the expectation-maximization
(EM) procedure for solving the TCA problem proposed
by Frey and Jojic (1999).

(a) r = 1, fail (b) r = 2, succeed (c) r = 3, fail

Fig. 3 Transformed PCA: Recovery of low-rank textures via
solving (4). For a checker-board pattern, if and only if we give the
correct rank, r = 2, can we get correctly rectified textures. On a
building facade, we try 6 different initial guesses of the rank from
1 to 6 and only rank r = 5 works approximately well.
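To make the role of the rank guess in (4) concrete, here is a minimal sketch of only the PCA half of the alternation, with the deformation held fixed at the identity. It uses the truncated SVD, which gives the best rank-r approximation in the Frobenius norm (Eckart-Young). The function name `best_rank_r` and the synthetic checkerboard are illustrative assumptions; the greedy search over the deformation is not shown.

```python
import numpy as np

def best_rank_r(M, r):
    """Best rank-r approximation of M in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Undeformed checkerboard with 4x4 blocks; its exact rank is 2.
ii, jj = np.indices((32, 32))
board = ((ii // 4 + jj // 4) % 2).astype(float)

# Least-squares residual of the PCA step for three different rank guesses.
residuals = {r: float(np.linalg.norm(board - best_rank_r(board, r)))
             for r in (1, 2, 3)}
print(residuals)
```

With r = 1 the fit leaves a large residual even without any deformation, so the subsequent deformation step would be driven by model error rather than by misalignment. With r = 3 the residual is already zero here, but on a deformed image the extra component can absorb part of the deformation. This is one way to see why the rank has to be guessed correctly for this scheme.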



3 Solution by Iterative Convex Optimization 



As proposed in (Peng et al, 2010a), although the rank
function and the $\ell^0$-norm in the original problem (3) are
extremely difficult to optimize (in general NP-hard),
recent breakthroughs in sparse representation and low-rank
matrix recovery have shown that under fairly broad
conditions, they can be replaced by their convex surrogates
(Candès et al, 2009; Chandrasekaran et al, 2009):
the matrix nuclear norm^7 $\|I^0\|_*$ for $\operatorname{rank}(I^0)$ and the
$\ell^1$-norm^8 $\|E\|_1$ for $\|E\|_0$, respectively. Thus, we end up
with the following optimization problem:

$$\min_{I^0, E, \tau} \; \|I^0\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad I \circ \tau = I^0 + E. \tag{5}$$

^7 The nuclear norm of a matrix is the sum of all its singular
values.

Algorithm 1 (The TILT Algorithm)

INPUT: Input image $I \in \mathbb{R}^{m \times n}$, initial transformation $\tau \in G$
(affine or projective), and a weight $\lambda > 0$.
WHILE not converged DO
  Step 1: Normalization and compute Jacobian:
    $I \circ \tau \leftarrow \dfrac{I \circ \tau}{\|I \circ \tau\|_F}$; $\quad \nabla I \leftarrow \dfrac{\partial}{\partial \zeta}\left(\dfrac{\operatorname{vec}(I \circ \zeta)}{\|\operatorname{vec}(I \circ \zeta)\|_F}\right)\Big|_{\zeta = \tau}$;
  Step 2 (inner loop): Solve the linearized problem:
    $(I^{0*}, E^*, \Delta\tau^*) \leftarrow \arg\min_{I^0, E, \Delta\tau} \|I^0\|_* + \lambda \|E\|_1$
    s.t. $I \circ \tau + \nabla I \Delta\tau = I^0 + E$;
  Step 3: Update the transformation: $\tau \leftarrow \tau + \Delta\tau^*$;
END WHILE
OUTPUT: Optimal solution $I^{0*}$, $\tau^*$ to problem (5).
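The two convex surrogates in (5) are cheap to evaluate. The sketch below computes them with NumPy for a hand-built example; the helper names, the 4 x 9 size, and the weight `lam = 0.5` are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A (convex surrogate for rank)."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

def l1_norm(E):
    """Sum of absolute values of the entries of E (convex surrogate for the l0 count)."""
    return float(np.abs(E).sum())

def tilt_convex_objective(I0, E, lam):
    """Objective of the relaxed problem: ||I0||_* + lam * ||E||_1."""
    return nuclear_norm(I0) + lam * l1_norm(E)

# Rank-1 texture u v^T: its only nonzero singular value is ||u|| * ||v|| = 2 * 3.
I0 = np.outer(np.ones(4), np.ones(9))

# Sparse error with two nonzero entries.
E = np.zeros((4, 9))
E[0, 0] = 0.5
E[2, 5] = -2.0

value = tilt_convex_objective(I0, E, lam=0.5)
print(nuclear_norm(I0), l1_norm(E), value)
```

For a rank-1 matrix the nuclear norm equals its single nonzero singular value, so the surrogate stays small for low-rank textures while growing with every additional nonzero singular value.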

We note that although the objective function in the
above problem is convex, the constraint $I \circ \tau = I^0 + E$ is
nonlinear in $\tau \in G$, and hence the problem is not convex.
A common technique to overcome this difficulty is
to linearize the constraint (Baker and Matthews, 2004;
Peng et al, 2010a) around the current estimate and iterate.
Thus, the constraint for the linearized version of
our problem becomes

$$I \circ \tau + \nabla I \Delta\tau = I^0 + E, \tag{6}$$

where $\nabla I$ is the Jacobian: derivatives of the image w.r.t.
the transformation parameters.^9 The optimization problem
in (5) reduces to

$$\min_{I^0, E, \Delta\tau} \; \|I^0\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad I \circ \tau + \nabla I \Delta\tau = I^0 + E. \tag{7}$$

The linearized problem above is a convex program and
is amenable to efficient solution. Since the linearization
is only a local approximation to the original nonlinear
problem, we solve it iteratively in order to converge to
a (local) minimum of the original non-convex problem
(5). The algorithm is summarized as Algorithm 1.
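The linearization in (6) can be illustrated on a toy case with a single transformation parameter. In the sketch below, the texture is a hypothetical continuous function, the transformation is a rotation by an angle `theta`, and the Jacobian is approximated by central differences; all of these choices, and the function names, are assumptions made for illustration. A practical implementation would typically derive the Jacobian from image gradients and would also apply the normalization of Step 1, which is omitted here.

```python
import numpy as np

def texture(x, y):
    """Hypothetical continuous texture that varies along x only."""
    return np.sin(3.0 * x) + 0.0 * y

def warped_image(theta, n=32):
    """Sample the texture on an n x n grid after rotating the coordinates by theta."""
    xs = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs)
    c, s = np.cos(theta), np.sin(theta)
    return texture(c * X - s * Y, s * X + c * Y)

def jacobian(theta, h=1e-5):
    """Central-difference derivative of the warped image w.r.t. theta (one n x n slice)."""
    return (warped_image(theta + h) - warped_image(theta - h)) / (2.0 * h)

theta, dtheta = 0.3, 1e-3
exact = warped_image(theta + dtheta)
zeroth_order = warped_image(theta)
first_order = zeroth_order + jacobian(theta) * dtheta

err0 = float(np.abs(exact - zeroth_order).max())
err1 = float(np.abs(exact - first_order).max())
print(err0, err1)
```

The first-order prediction is far closer to the truly warped image than the unchanged image is, but only for small `dtheta`, which is why the linearized problem has to be re-solved around each new estimate.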

The iterative linearization scheme outlined above is
a common technique in optimization to solve nonlinear
problems. It can be shown that this kind of iterative linearization
converges quadratically to a local minimum
of the original non-linear problem. A complete proof is
out of the scope of this paper. We refer the interested
reader to Peng et al (2010b); Cromme (1978); Jittorntrum
and Osborne (1980) and the references therein.

^8 The $\ell^1$-norm of a matrix is the sum of the absolute values of
its entries.

^9 Strictly speaking, $\nabla I$ is a 3D tensor: it gives a vector of
derivatives at each pixel whose length is the number of parameters
in the transformation $\tau$. When we "multiply" $\nabla I$ with another
matrix or vector, it contracts in the obvious way which should be
clear from the context.



3.1 Fast Algorithm Based on Augmented Lagrange 
Multiplier Methods 

The most computationally expensive part of Algorithm
1 is solving the convex program in its inner loop (Step
2). This can be cast as a semidefinite
program and can be solved using conventional algorithms
such as interior-point methods. While interior-point
methods have excellent convergence properties,
they do not scale very well with problem size and hence are
unsuitable for real applications involving large images.
Fortunately, there has been a recent flurry of work in developing
fast, scalable algorithms for nuclear norm minimization
(Cai et al, 2010; Toh and Yun, 2010; Ganesh
et al, 2009; Lin et al, 2009). To solve the linearized problem
in (7), we use the Augmented Lagrange Multiplier
(ALM) method (Bertsekas, 2004; Lin et al, 2009). For
the sake of completeness, in this section we explain how
the ALM method can be adapted to solve our problem, 
and also comment on some implementation details for 
improving stability and range of convergence. 

3.1.1 General Formulation of ALM 

We first review the ALM algorithm in a more general
setting, rather than for our specific problem. This will
be useful later when we deal with different variations
of the TILT algorithm that can all be solved under the
same algorithmic framework described here.

Let us consider convex optimization problems of the
form:

$$\min_X \; f(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \tag{8}$$

where $f$ is a convex (not necessarily smooth) function,
$\mathcal{A}$ is a linear function, and $b$ is a vector of appropriate
dimension. The basic idea of Lagrangian methods
is to convert the above constrained optimization problem
into an unconstrained problem that has the same
optimal solution.

For the above problem (8), we define the augmented
Lagrangian function as follows:

$$\mathcal{L}_\mu(X, Y) = f(X) + \langle Y, b - \mathcal{A}(X) \rangle + \frac{\mu}{2} \|b - \mathcal{A}(X)\|_2^2, \tag{9}$$

where $Y$ is a Lagrange multiplier vector of appropriate
dimension, $\|\cdot\|_2$ denotes the Euclidean norm, and $\mu > 0$
denotes the penalty imposed upon infeasible points.
The following result from Bertsekas (2004) establishes
an important relation between the original problem (8)
and its augmented Lagrangian function (9).

Theorem 1 (Optimality of ALM) Suppose that $\hat{X}$
is the optimal solution to (8). Then, for appropriate
choice of $Y$ and sufficiently large $\mu$, we have

$$\hat{X} = \arg\min_X \; \mathcal{L}_\mu(X, Y).$$

Thus, we could solve an unconstrained convex minimization
problem in order to obtain the solution to the
convex program (8). This result, while of theoretical interest,
is not directly useful in practice since the choice
of $Y$ and $\mu$ is not known a priori.

ALM methods are a class of algorithms that simultaneously
minimize the augmented Lagrangian function
and compute an appropriate Lagrange multiplier. The
basic ALM iteration proposed in Bertsekas (2004) is
given by

$$\begin{aligned} X_k &= \arg\min_X \; \mathcal{L}_{\mu_k}(X, Y_k), \\ Y_{k+1} &= Y_k + \mu_k \left(b - \mathcal{A}(X_k)\right), \\ \mu_{k+1} &= \rho \cdot \mu_k, \end{aligned} \tag{10}$$

where $\{\mu_k\}$ is a monotonically increasing positive sequence
($\rho > 1$). Thus, we have reduced the original optimization
problem (8) to a sequence of unconstrained
convex programs.
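The basic ALM iteration can be tried on a small problem where the inner minimization has a closed form. The sketch below uses the hypothetical choice f(x) = ½‖x‖² with a linear constraint Ax = b, so that setting the gradient of the augmented Lagrangian to zero gives a linear system in x. The function name, the starting penalty `mu0`, the growth factor `rho`, and the iteration count are assumptions made for illustration, not settings from the paper.

```python
import numpy as np

def alm_min_norm(A, b, mu0=1.0, rho=1.5, iters=30):
    """ALM for: min 0.5 * ||x||^2  s.t.  A x = b.

    Augmented Lagrangian: 0.5||x||^2 + <y, b - A x> + (mu/2) ||b - A x||^2.
    """
    m, n = A.shape
    y = np.zeros(m)
    mu = mu0
    x = np.zeros(n)
    for _ in range(iters):
        # x-step: zero gradient gives (I + mu A^T A) x = A^T (y + mu b).
        x = np.linalg.solve(np.eye(n) + mu * A.T @ A, A.T @ (y + mu * b))
        # Multiplier step with the sign convention <y, b - A x>.
        y = y + mu * (b - A @ x)
        # Monotonically increasing penalty sequence (rho > 1).
        mu *= rho
    return x, y

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

x_alm, y_alm = alm_min_norm(A, b)
x_ref = np.linalg.pinv(A) @ b   # minimum-norm solution of A x = b
print(np.abs(x_alm - x_ref).max(), np.abs(A @ x_alm - b).max())
```

The iterate approaches the minimum-norm solution without the penalty ever having to be infinite, because the multiplier update absorbs the remaining constraint violation.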

The above iteration is computationally useful only
if $\mathcal{L}_\mu(X, Y)$ is easy to minimize with respect to $X$. For
the problems encountered in this paper, this turns out
to be the case indeed. This can be attributed to the
following key property of the matrix nuclear norm and
1-norm:

$$\begin{aligned} \mathcal{S}_\mu[W_1 + W_2] &= \arg\min_X \; \mu \|X\|_1 - \langle X, W_1 \rangle + \tfrac{1}{2} \|X - W_2\|_F^2, \\ U \mathcal{S}_\mu[\Sigma] V^T &= \arg\min_X \; \mu \|X\|_* - \langle X, W_1 \rangle + \tfrac{1}{2} \|X - W_2\|_F^2, \end{aligned} \tag{11}$$

where $U \Sigma V^T$ is the Singular Value Decomposition (SVD)
of $(W_1 + W_2)$ and $\mu$ is any non-negative real constant.
Here, $\mathcal{S}_\mu[\cdot]$ represents the soft-thresholding or shrinkage
operator, which is defined on scalars as follows:

$$\mathcal{S}_\mu[x] = \operatorname{sign}(x) \cdot \max\{|x| - \mu,\, 0\}, \tag{12}$$

where $\mu \ge 0$. The shrinkage operator is extended to
vectors and matrices by applying it elementwise. We
now discuss how this iterative scheme can be applied
to our linearized convex program (7).
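The two closed-form minimizers in (11) translate directly into a few lines of NumPy. This is a minimal sketch; the function names `soft_threshold` and `singular_value_threshold` are illustrative choices.

```python
import numpy as np

def soft_threshold(W, mu):
    """Elementwise shrinkage: sign(w) * max(|w| - mu, 0)."""
    return np.sign(W) * np.maximum(np.abs(W) - mu, 0.0)

def singular_value_threshold(W, mu):
    """Apply the shrinkage operator to the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * soft_threshold(s, mu)) @ Vt

# Entries with magnitude below the threshold are set to zero,
# the others move toward zero by mu.
shrunk = soft_threshold(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0)

# Singular values 3 and 0.5 become 2 and 0, so the result has rank 1.
low_rank = singular_value_threshold(np.diag([3.0, 0.5]), 1.0)

print(shrunk)
print(low_rank)
```

Shrinking entries produces sparsity, while shrinking singular values produces low rank, which is why these two operators are the basic building blocks of the solver described next.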



3.1.2 Solving TILT by Alternating Direction Method 

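For the linearized problem, the augmented Lagrangian is defined as:

$$
\mathcal{L}_\mu(I^0, E, \Delta\tau, Y) = f(I^0, E) + \langle Y, R(I^0, E, \Delta\tau)\rangle + \frac{\mu}{2}\|R(I^0, E, \Delta\tau)\|_F^2, \tag{13}
$$

where $\mu > 0$, $Y$ is a Lagrange multiplier matrix, $\langle\cdot,\cdot\rangle$ denotes the matrix inner product, and

$$
\begin{aligned}
f(I^0, E) &= \|I^0\|_* + \lambda\|E\|_1, \\
R(I^0, E, \Delta\tau) &= I\circ\tau + \nabla I\,\Delta\tau - I^0 - E.
\end{aligned}
$$

From the above discussion, the basic ALM iteration scheme for our problem is given by

$$
\begin{aligned}
(I^0_k, E_k, \Delta\tau_k) &= \arg\min_{I^0, E, \Delta\tau} \mathcal{L}_{\mu_{k-1}}(I^0, E, \Delta\tau, Y_{k-1}), \\
Y_k &= Y_{k-1} + \mu_{k-1} R(I^0_k, E_k, \Delta\tau_k).
\end{aligned}
$$

Throughout the rest of the paper, we will always assume that $\mu_k = \rho^k\mu_0$ for some $\mu_0 > 0$ and $\rho > 1$, unless otherwise specified.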

We now focus on efficiently solving the first step of the above iterative scheme. In general, it is computationally expensive to minimize over all the variables $I^0$, $E$ and $\Delta\tau$ simultaneously. So, we adopt a common strategy and solve it approximately by alternating minimization, i.e., minimizing with respect to $I^0$, $E$ and $\Delta\tau$ one at a time:

$$
\begin{aligned}
I^0_{k+1} &= \arg\min_{I^0} \mathcal{L}_{\mu_k}(I^0, E_k, \Delta\tau_k, Y_k), \\
E_{k+1} &= \arg\min_{E} \mathcal{L}_{\mu_k}(I^0_{k+1}, E, \Delta\tau_k, Y_k), \\
\Delta\tau_{k+1} &= \arg\min_{\Delta\tau} \mathcal{L}_{\mu_k}(I^0_{k+1}, E_{k+1}, \Delta\tau, Y_k).
\end{aligned} \tag{14}
$$

Due to the special structure of our problem, each of the above optimization problems has a simple closed-form solution, and hence can be solved in a single step. More precisely, the solutions to (14) can be expressed explicitly using the shrinkage operator as follows:

$$
\begin{aligned}
I^0_{k+1} &\leftarrow U_k\,\mathcal{S}_{\mu_k^{-1}}[\Sigma_k]\,V_k^T, \\
E_{k+1} &\leftarrow \mathcal{S}_{\lambda\mu_k^{-1}}[I\circ\tau + \nabla I\,\Delta\tau_k - I^0_{k+1} + \mu_k^{-1}Y_k], \\
\Delta\tau_{k+1} &\leftarrow (\nabla I)^\dagger(-I\circ\tau + I^0_{k+1} + E_{k+1} - \mu_k^{-1}Y_k),
\end{aligned} \tag{15}
$$

where $U_k\Sigma_k V_k^T$ is the SVD of $(I\circ\tau + \nabla I\,\Delta\tau_k - E_k + \mu_k^{-1}Y_k)$, and $(\nabla I)^\dagger$ denotes the Moore-Penrose pseudo-inverse of $\nabla I$.
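As a brief check of how the closed forms (15) follow from (14), the steps can be sketched as below. This is only a sketch; the shorthand $M_k = I\circ\tau + \nabla I\,\Delta\tau_k$ is introduced here for illustration and is not part of the original notation. Completing the square in the last two terms of (13) gives the first line, and substituting it into each subproblem of (14) gives the remaining three:

$$
\begin{aligned}
\langle Y, R\rangle + \tfrac{\mu}{2}\|R\|_F^2 &= \tfrac{\mu}{2}\|R + \mu^{-1}Y\|_F^2 - \tfrac{1}{2\mu}\|Y\|_F^2, \\
I^0_{k+1} &= \arg\min_{I^0}\; \|I^0\|_* + \tfrac{\mu_k}{2}\|I^0 - (M_k - E_k + \mu_k^{-1}Y_k)\|_F^2, \\
E_{k+1} &= \arg\min_{E}\; \lambda\|E\|_1 + \tfrac{\mu_k}{2}\|E - (M_k - I^0_{k+1} + \mu_k^{-1}Y_k)\|_F^2, \\
\Delta\tau_{k+1} &= \arg\min_{\Delta\tau}\; \|\nabla I\,\Delta\tau - (-I\circ\tau + I^0_{k+1} + E_{k+1} - \mu_k^{-1}Y_k)\|_F^2.
\end{aligned}
$$

Dividing the second and third objectives by $\mu_k$ and applying (11) with $W_1 = 0$ yields the thresholds $\mu_k^{-1}$ and $\lambda\mu_k^{-1}$, respectively. The last line is an ordinary least-squares problem, whose minimum-norm solution is given by the pseudo-inverse.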

From experiments, we observe that the above algorithm is much faster than the other convex optimization schemes (such as the interior point method, accelerated proximal gradient, etc.). Although the convergence of the ALM method (10) has been well established in the optimization literature, its approximation by the above alternating minimization, known as the alternating direction method (ADM) of multipliers, is not always guaranteed to converge to the optimal solution. If there are only two alternating terms, its convergence has been well-studied and established (Glowinski and Marroco, 1975; Gabay and Mercier, 1976; Eckstein and Bertsekas, 1992). Somewhat surprisingly, however, very little is proven for the convergence of cases where there are three or more alternating terms, despite overwhelming empirical success with such schemes. Recently, Yuan and Tao (2010) obtained a convergence result for a certain three-term alternation applied to the noisy principal component pursuit problem (see also (He, 2009)). However, the scheme proposed and proved in (Yuan and Tao, 2010) is slightly different from the direct ADM scheme (14) and is much slower in practice. The convergence of the ADM scheme (14) remains an open problem, although in practice it gives the simplest and fastest algorithm.

Algorithm 2 (Solving Inner Loop of TILT)

INPUT: The current (deformed and normalized) image $I\circ\tau \in \mathbb{R}^{m\times n}$ and its Jacobian $\nabla I$ against deformation $\tau$, and $\lambda > 0$.
Initialization: $k = 0$, $Y_0 = 0$, $E_0 = 0$, $\Delta\tau_0 = 0$, $\mu_0 > 0$, $\rho > 1$;
WHILE not converged DO
    $(U_k, \Sigma_k, V_k) = \mathrm{svd}(I\circ\tau + \nabla I\,\Delta\tau_k - E_k + \mu_k^{-1}Y_k)$;
    $I^0_{k+1} = U_k\,\mathcal{S}_{\mu_k^{-1}}[\Sigma_k]\,V_k^T$;
    $E_{k+1} = \mathcal{S}_{\lambda\mu_k^{-1}}[I\circ\tau + \nabla I\,\Delta\tau_k - I^0_{k+1} + \mu_k^{-1}Y_k]$;
    $\Delta\tau_{k+1} = (\nabla I)^\dagger(-I\circ\tau + I^0_{k+1} + E_{k+1} - \mu_k^{-1}Y_k)$;
    $Y_{k+1} = Y_k + \mu_k(I\circ\tau + \nabla I\,\Delta\tau_{k+1} - I^0_{k+1} - E_{k+1})$;
    $\mu_{k+1} = \rho\mu_k$;
END WHILE
OUTPUT: solution $(I^0, \Delta\tau)$ to the linearized problem.

We summarize the ADM scheme for solving the linearized problem as Algorithm 2. We choose the sequence $\{\mu_k\}$ to satisfy $\mu_{k+1} = \rho\mu_k$ for some $\rho > 1$. We note that the operations in each step of the algorithm are very simple, with the SVD computation being the most computationally expensive step.
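The inner loop can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `tilt_inner_loop`, the default `rho = 1.25`, the default choice of $\mu_0$ from the spectral norm of the input, and the relative-residual stopping rule are all assumptions made here. The Jacobian is assumed to be stored as an $(mn) \times p$ matrix whose columns are row-major vectorized derivative images.

```python
import numpy as np

def shrink(x, mu):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def tilt_inner_loop(i_tau, jac, lam, mu0=None, rho=1.25, tol=1e-7, max_iter=500):
    """ADM iteration for the linearized problem (sketch).

    i_tau : (m, n) warped, normalized image I o tau (assumed nonzero).
    jac   : (m*n, p) Jacobian; column j is the row-major vectorized
            derivative of the warped image w.r.t. parameter j.
    lam   : weight on the l1 term.
    """
    m, n = i_tau.shape
    jac_pinv = np.linalg.pinv(jac)           # (grad I)^dagger, computed once
    y = np.zeros((m, n))
    e = np.zeros((m, n))
    dtau = np.zeros(jac.shape[1])
    # Assumed heuristic for mu_0; the paper only requires mu_0 > 0.
    mu = mu0 if mu0 is not None else 1.25 / max(np.linalg.norm(i_tau, 2), 1e-12)
    scale = np.linalg.norm(i_tau)            # Frobenius norm, for the stopping rule
    i0 = np.zeros((m, n))
    for _ in range(max_iter):
        lin = i_tau + (jac @ dtau).reshape(m, n)
        # Low-rank update: shrink singular values by 1/mu.
        u, s, vt = np.linalg.svd(lin - e + y / mu, full_matrices=False)
        i0 = (u * shrink(s, 1.0 / mu)) @ vt
        # Sparse-error update: elementwise shrinkage by lam/mu.
        e = shrink(lin - i0 + y / mu, lam / mu)
        # Deformation update: least squares via the pseudo-inverse.
        dtau = jac_pinv @ (-i_tau + i0 + e - y / mu).ravel()
        # Multiplier and penalty updates.
        resid = i_tau + (jac @ dtau).reshape(m, n) - i0 - e
        y = y + mu * resid
        mu *= rho
        if np.linalg.norm(resid) <= tol * scale:
            break
    return i0, e, dtau
```

Precomputing the pseudo-inverse outside the loop reflects the fact that $\nabla I$ is fixed within the inner loop, so the SVD of the $m \times n$ matrix remains the dominant cost per iteration.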



3.2 Implementation Details 

In the previous section, we described how the linearized and convexified TILT problem can be solved efficiently using the ALM algorithm. However, there are a few caveats in applying it to real images. In this section, we discuss some possible ways to deal with these issues and make the problem well-defined. We also discuss some specific implementation details that could potentially improve the range of convergence of our algorithm.

Footnote (on the SVD step of Algorithm 2): Empirically, we notice that for larger window sizes (over 100 × 100 pixels), it is much faster to run the partial SVD instead of the full SVD, if the rank is known to be very low.



Constraints on the Transformations. As discussed in Section 2, there are certain ambiguities in the definition of low-rank texture. The rank of a low-rank texture function is invariant with respect to scaling in the pixel values, scaling in each of the coordinate axes, and translation along any direction. Thus, in order for the problem to have a unique, well-defined optimal solution, we need to eliminate these ambiguities. In Step 1 of Algorithm 1, the intensity of the image is renormalized in each iteration in order to eliminate the ambiguity of scale in the pixel values. Otherwise, the algorithm may tend to converge to a "globally optimal" solution by zooming into a black pixel or dark region of the image.

To deal with the ambiguities in the domain transformation, we could add some additional constraints to the problem. Let $\tau(\cdot)$ represent the transformation. Suppose that the support of the initial image window $\Omega$ is a rectangle (call the edges $e_1$ and $e_2$) with the lengths of the two edges being $L(e_1) = a$ and $L(e_2) = b$, so that the total area is $S(\Omega) = ab$.

For affine transformations, to eliminate the ambiguity in translation, we typically enforce that the center $x_0$ of the initial rectangular region $\Omega$ remain fixed before and after the transformation, i.e., $\tau(x_0) = x_0$. This imposes a set of linear constraints on $\Delta\tau$ of the form:

$$
A_t\,\Delta\tau = 0. \tag{16}
$$

To eliminate the ambiguities in scaling the coordinates, we enforce (only for affine transformations) that the area and the ratio of edge lengths remain constant before and after the transformation, i.e., $S(\tau(\Omega)) = S(\Omega)$ and $L(\tau(e_1))/L(\tau(e_2)) = L(e_1)/L(e_2)$. In general, these equalities impose additional nonlinear constraints on the desired transformation $\tau$ in problem (5). Similar to the way we dealt with the non-linearity in the constraint in (5), we can linearize these additional constraints w.r.t. the transformation parameters $\tau$ and obtain another set of linear constraints on $\Delta\tau$, denoted by:

$$
A_s\,\Delta\tau = 0. \tag{17}
$$

We have given a more detailed explanation and derivation of these two sets of linear constraints in the appendix.

For projective transformations, we typically fix two points: the two diagonal corners of the initial rectangular window, or of the parallelogram if initialized with the result of the affine TILT. Notice that a homography matrix has a total of eight degrees of freedom. If the low-rank texture is associated with a certain symmetric pattern that has two sets of parallel lines, the x- and y-axes of the rectified low-rank texture then correspond to the two vanishing points. The two vanishing points and the two fixed points together impose exactly eight constraints and uniquely determine the homography. Hence, with this parameterization, there is no ambiguity in the optimal solution.

Footnote (on fixing two points): In fact, one can use the same set of constraints as in the affine case. But from our experience, the algorithm is more stable with the initialization of two points. In addition, as explained above, the parameterization is more geometrically meaningful.

Thus, to eliminate the scaling and translation ambiguities in the solution, we simply add a set of linear constraints to the optimization problem. The resulting convex program can be solved again using the ALM algorithm. This would involve making very small modifications to Algorithm 2 to incorporate the additional linear constraints.

Multi-Resolution Approach. While the above formulation works reasonably well in practice, the presence of arbitrarily shaped sharp features or contours on an otherwise smooth low-rank texture can cause the TILT algorithm to converge to a local minimum that is not the desired solution. Hence, to cope with large deformations, we adopt a multi-resolution approach. This is a common technique in many computer vision algorithms wherein we construct a pyramid of images, starting from the input image, by successively blurring and downsampling it. The problem is then solved at the lowest resolution first. The solution thus obtained is used to initialize the algorithm at the adjacent level of higher resolution, and this procedure is repeated for all levels. In practice, the multi-resolution approach not only improves the range of transformations that our algorithm can handle, but also improves the running time of the algorithm significantly. This is because the convex programs can be solved much faster at the lower resolutions, and since the initialization at the higher resolution is better, the number of iterations to convergence is typically very small (less than 20).

Footnote (on initializing with the result of the affine TILT): In practice, we almost always initialize the projective case with the result from the affine case.

Footnote (on incorporating the additional linear constraints): We only have to introduce an additional set of Lagrangian multipliers and then revise accordingly the update equation associated with $\Delta\tau_{k+1}$.

An important consideration while incorporating the multi-resolution approach for the TILT algorithm is the fact that the convex relaxation discussed in Section 3 is tight only at higher dimensions. Although it is very difficult to analytically estimate the minimum optimal size of the image, in practice, we find that our method works well for windows of size larger than 20 × 20 pixels. In our implementation, we use a Gaussian kernel to blur the image and consider up to two levels of downsampling, each by a factor of 2 w.r.t. its adjacent higher level of resolution. We also ensure that the size of the image at the lowest resolution is at least 20 × 20 pixels. We tested the speed of this scheme in Matlab on a 3 GHz PC. Fixing the initial window to have size 50 × 50, the running time is less than 6 seconds, averaged over 100 trials.

Footnote (on the tightness of the convex relaxation): The convex relaxation has a failure probability associated with it which typically decays as $O(n^{-\alpha})$, for some $\alpha > 0$, assuming that the matrices involved have size $n \times n$.
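The pyramid construction described above (Gaussian blur, downsampling by a factor of 2, at most two extra levels, and a minimum size of 20 × 20) can be sketched as follows. This is a minimal illustration under assumptions: the function names, the kernel width (`sigma = 1.0`, `radius = 2`) and the edge-replicating padding are choices made here, not details given in the paper.

```python
import numpy as np

def gaussian_kernel_1d(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur with edge-replicating padding."""
    k = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def build_pyramid(image, max_levels=2, min_size=20):
    """Return the pyramid ordered from coarsest to finest.

    A coarser level is added only if both of its dimensions
    would still be at least `min_size` pixels.
    """
    levels = [np.asarray(image, dtype=float)]
    for _ in range(max_levels):
        h, w = levels[-1].shape
        if min(h // 2, w // 2) < min_size:
            break
        levels.append(blur(levels[-1])[::2, ::2])
    return levels[::-1]
```

In a coarse-to-fine loop, one would solve at `levels[0]` first and use the result to initialize the next level; how the transformation parameters are rescaled between levels depends on the parameterization and is not shown here.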

Branch-and-Bound Scheme. We can increase the range of deformation that our algorithm can handle significantly by employing a branch-and-bound scheme. For instance, in the affine case, we initialize Algorithm 1 with different deformations (e.g., a combination search for all 4 degrees of freedom for affine transforms with no translation). Any affine transformation can be parameterized by $[A \; b] \in \mathbb{R}^{2\times 2} \times \mathbb{R}^2$. Since we fix the center of the window, we effectively set $b = 0$. The remaining 4 parameters of the transformation denote the scaling along the x- and y-axes, rotation, and skew. As discussed in Section 2, the scaling along the canonical axes does not change the rank of the texture, and hence, we ignore the ambiguity in it. Thus, we are left with two parameters, skew and rotation, that need to be determined. In other words, we can parameterize the affine matrix $A$ as:

$$
A(\theta, t) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \times \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.
$$

We partition the parameter space (rotation and skew) into multiple regions and perform a greedy search on the regions one-by-one. We first run TILT for various initializations of the rotation angle. We choose the one that minimizes the cost function, and use this as an initialization to search for the skew parameters along the x-direction first, and subsequently along the y-direction. The parameters that minimize the cost function are the output of the branch-and-bound scheme.
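The parameterization $A(\theta, t)$ and the greedy search order (rotation first, then skew) can be sketched as below. This is a simplified illustration: `cost` is a hypothetical callback that would run TILT from the given initialization and return the final objective value, and only the single skew parameter $t$ of $A(\theta, t)$ is searched, whereas the text above also searches a skew along the y-direction.

```python
import numpy as np

def affine_matrix(theta, t):
    """A(theta, t) = rotation by theta times a skew of t along x."""
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    skew = np.array([[1.0, t],
                     [0.0, 1.0]])
    return rot @ skew

def greedy_search(cost, thetas, skews):
    """Pick the best rotation with zero skew, then the best skew.

    cost   : callable mapping a 2x2 initial matrix to a scalar
             (hypothetical stand-in for running TILT).
    thetas : candidate rotation angles in radians.
    skews  : candidate skew values.
    """
    best_theta = min(thetas, key=lambda th: cost(affine_matrix(th, 0.0)))
    best_t = min(skews, key=lambda t: cost(affine_matrix(best_theta, t)))
    return best_theta, best_t
```

The greedy order evaluates `len(thetas) + len(skews)` initializations rather than their product, which is the reason it is cheap enough to run at the coarsest pyramid level.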

A natural concern about such a branch-and-bound scheme is its effect on speed. Within the multi-resolution scheme, we only need to perform branch-and-bound at the level of lowest resolution, find the best solution, and use it to initialize the higher-resolution levels. Since Algorithm 1 is extremely fast for small matrices at the lowest-resolution level, running multiple instances with different initializations does not significantly affect the overall speed. In a similar spirit, to find the optimal projective transform (homography), we always find the optimal affine transform first and then use it to initialize the algorithm. From our experience, we found that with such initialization, we normally do not have to use the branch-and-bound scheme for the projective transformation case.

Footnote (on initializing with the affine transform): Notice that, for a perspective image of a plane, the affine model is approximately true if the size of the patch is small compared to the distance.



4 Experimental Results 

In this section, we present the results of the proposed 
TILT algorithm on various natural and artificial low- 
rank textures. We first present some results quantify- 
ing the performance range of our algorithm. We then 
present examples from many different categories of nat- 
ural images where TILT can recover the inherent sym- 
metrical texture in the images. Finally, we present some 
examples where TILT does not recover the low-rank 
texture and examine the reasons for such failures. 



4.1 Range of Convergence of TILT 

For most low-rank textures, the proposed Algorithm 1 has a fairly large range of convergence, without using any branch-and-bound. In this section, we give a careful characterization of the range of convergence (ROC) of the proposed algorithm on a standard checker-board pattern.

Affine Case. We deform a checker-board like pattern by a wide range of affine transforms of the form $y = Ax + b$, $x, y \in \mathbb{R}^2$, and test if the algorithm converges back to the correct solution. We parameterize the affine matrix $A$ as $A(\theta, t) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \times \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$. We change $(\theta, t)$ within the range $\theta \in [0, \pi/6]$ with step size $\pi/60$, and $t \in [0, 1]$ with step size 0.05. We repeat the simulations 10 times in each region and compute the success rate. Figure 4(b) shows the rate of success for all regions. Notice that the algorithm always finds the correct solution for up to $\theta = 20°$ of rotation and skew (or warp) of up to $t = 0.4$. We note that, due to its rich symmetries and sharp edges, the checker-board like pattern is a challenging case for "global" convergence, as at many angles its image corresponds to a local minimum that has relatively low rank. In practice, we find that for most symmetric patterns in urban scenes (as shown in Figure 8), our algorithm converges for a much larger range without any branch-and-bound. So the large ROC ensures that a simple partition of the parameter space with a branch-and-bound scheme can make the TILT algorithm work for the entire range of affine transformations.



Projective Case. For the case of projective transforma- 
tions (homographies), even if we fix two points, there 
are still four remaining degrees of freedom. It is difficult 
to illustrate the range of convergence for all four dimen- 
sions together. So here we test the range of convergence 
for some of the most representative projective transfor- 
mations that we normally encounter in real-world im- 
ages: a planar low-rank pattern rotating in front of a 
perspective camera. 

More specifically, we put a standard checker-board pattern in front of a standard perspective camera: the image plane is the xy-plane and the optical axis is the z-axis. We rotate the pattern about a line through the origin within the xy-plane. We indicate the location of the axis of rotation by the angle it makes with the x-axis. Experimentally, we find the limits of the TILT algorithm by gradually increasing the amount of rotation along each axis (from 0° to 90° at a step of 5°). We also change the rotation axis from the x-direction (0°) to the y-direction (90°). Figure 5 shows the range of convergence of TILT under this setting. The curves indicate when TILT fails for the first time, or in other words, TILT succeeds for all cases below the curves.

The two curves in the plot compare two cases: the first case (green curve) is just the basic projective TILT without any special initialization or branch-and-bound; the second case (red curve) is the projective TILT initialized with the results from the affine TILT. From these results, we may conclude:

— The basic projective TILT works extremely well for the slanted checker-board like pattern: it converges up to 50° of rotation in all directions.

— Initialization with the affine TILT normally boosts the range of convergence for the projective TILT up to 65° of rotation, in some cases increasing it by as much as 20°.

There are many possible ways to further improve the 
range of convergence for the TILT algorithm. So far, we 
have always used a square window as the initial window. 
As we will see with experiments in later sections, TILT 
could work much better if the initial window is chosen 
in a way that is more adaptive to the orientation of the 
texture as well as the scale of the texture. 

Footnote (continued from Section 3.2, on the affine approximation): The affine model holds approximately only when the patch is small compared to its distance from the camera. The projective model, however, applies regardless of the size.

Footnote (on the range of rotation axes in the projective experiment): The setting is symmetric and the pattern is symmetric, so we only have to verify the range of convergence for the first quadrant.

Fig. 4 Range of Convergence for Affine Transform without branch-and-bound. (a) Representative Input Images in Each Region; (b) Convergence Probability Map ("Convergence Range of TILT under AFFINE mode"). Image on the left: initial input images that correspond to different regions of the parameter space in the plot on the right. Plot on the right: x-axis: rotation angle $\theta$; y-axis: skew parameter $t$. White region indicates success in all trials while the black region indicates failure in all trials.

Fig. 5 Range of Convergence for Projective Transform. Image on the left: Representative initial input images for which the TILT algorithm succeeds without any special initialization or branch-and-bound. Plot on the right: x-axis: the position of the rotation axis; y-axis: the amount of rotation. Green curve: without initialization with affine TILT. Red curve: initialized with affine TILT.

4.2 Robustness of TILT

In this experiment, we test the robustness of TILT on some representative synthetic and realistic low-rank patterns, shown in Figure 6 (left). We introduce a small deformation to each texture (say rotation by 10°) and examine if TILT converges to the correct solution under different levels of random corruption. We randomly select a fraction (from 0% to 100%) of the pixels and assign them a random value in the range (0, 255). We run the TILT algorithm on such corrupted images and examine how many images are correctly rectified by TILT at each level of corruption. The results are shown in Figure 6 (right). Notice that for almost three quarters of the textures TILT can tolerate up to 30% random corruption, and for textures in the first row TILT can rectify the deformation even if more than 50% of the pixels are corrupted. Also, we notice that the textures for which TILT has low error tolerance either have very low contrast, or are rather sparse, or have relatively high rank.

Figure 7 shows some more examples of the robustness of the proposed algorithm to random corruption, occlusions, and cluttered background, respectively. For the first two experiments, no branch-and-bound is needed.
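The corruption protocol used in this experiment (replace a chosen fraction of pixels with random intensities) is simple to reproduce. The following is a sketch under assumptions: the function name `corrupt_pixels` is hypothetical, values are drawn uniformly from [0, 255), and pixels are selected without replacement; the paper does not specify these details.

```python
import numpy as np

def corrupt_pixels(image, fraction, rng=None):
    """Return a copy of `image` with a given fraction of pixels
    replaced by values drawn uniformly from [0, 255).

    fraction : float in [0, 1], share of pixels to corrupt.
    rng      : optional numpy Generator for reproducibility.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(image, dtype=float, copy=True)
    n_corrupt = int(round(fraction * out.size))
    # Choose distinct pixel positions so exactly n_corrupt are altered.
    idx = rng.choice(out.size, size=n_corrupt, replace=False)
    out.flat[idx] = rng.uniform(0.0, 255.0, size=n_corrupt)
    return out
```

Sweeping `fraction` from 0 to 1 and counting the textures that are still rectified correctly at each level would give a curve of the kind summarized in Figure 6 (right).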
Fig. 6 Robustness Tests of TILT on various low-rank textures. The textures on the left are ordered in descending order of robustness to random corruption: from left to right, from top to bottom. The plot on the right ("Robustness of TILT w.r.t. random corruption"; x-axis: corruption rate) shows for how many textures TILT succeeds at each level of corruption.
Fig. 7 Robustness of TILT. Panels show, for each example, the input $I$, the low-rank texture $I^0$, and the sparse error $E$. Top row: random corruption added to 60% of the pixels; Middle row: scratches added on a symmetric pattern; Bottom row: containing cluttered background.



The above experiments demonstrate the robustness of TILT to randomly located corruptions. However, in some cases we may have some idea about which parts of the image are likely to be corrupted or occluded. For instance, if the initial window is too close to the image boundary, the algorithm may converge to a region outside of the image boundary. In such cases, we know which pixels in the region are missing. This information can help us to modify the algorithm and further improve its robustness. We will discuss this case in more detail in Section 5 when we study possible extensions to TILT.
4.3 Shape from Low-rank Textures

Obviously, the rectified low-rank textures found by our algorithm can facilitate many vision tasks, including establishing correspondences among images, recognizing text and objects, or reconstructing the 3D structure of a scene, etc. Due to limited space, we only illustrate how our algorithm can help extract precise, rich geometric and structural information from an image of an urban scene, as shown in Figure 8 (top). This complements many existing "Shape from X" methods in the vision literature.

The size of the image shown in Figure 8 is 1024 × 685 pixels and we simply run the TILT algorithm (with affine transforms) on a grid of 60 × 60 windows. If the rank of the resulting texture drops significantly from that of the original window, we say that the algorithm has "detected" a region with some low-rank structure. In Figure 8, we have shown the resulting deformed windows, together with the local orientation and surface normal recovered from the recovered affine transformation. Notice that for windows located inside the building facades, TILT correctly recovers the local geometry for almost all of them; even for patches located at the edge of the facades, one of the sides of the rectified patches always aligns precisely with the building's edge.

Of course, one can initialize the windows at different sizes or scales. But for larger regions, affine transforms will not be accurate enough to describe the deformation caused by a perspective projection. For instance, the entire facade of the middle building in Figure 8 (middle row) obviously exhibits significant projective deformation. Nevertheless, if we initialize the projective TILT algorithm with the output from the affine TILT algorithm on a small patch on the facade, the algorithm can easily converge to the correct homography and recover the low-rank textures correctly, as shown in Figure 8 (middle row).

With both the low-rank textures and their geometry correctly recovered, we can easily perform many interesting tasks such as editing parts of the images while respecting the true 3D shape and the correct perspective. Figure 8 (middle row) shows some examples, which suggest that our method can be very useful for many augmented-reality related applications.

Footnote (on detecting low-rank structure): The image rank is computed by thresholding the singular values at 1/30th of the largest one. Also, we throw away regions whose largest singular value is too small, which typically correspond to a smooth region like the sky.

4.4 Rectifying Many Categories of Low-rank Textures



In this section, we test the efficacy of the TILT algorithm on natural images belonging to various categories. Besides some examples where TILT works very well, we also present some cases that are particularly challenging where our algorithm succeeds only to some extent, and some examples where it fails. We believe that from these examples, the readers may gain a better understanding of both the strengths and limitations of the TILT algorithm.

Since the proposed TILT algorithm has a decent range of convergence for both affine and projective deformations and is also very robust to sparse corruption of the image intensity, we find that it works remarkably well for a very broad range of patterns, regular structures, natural objects and even printed text with an approximate low-rank structure. Figure 9 shows many such examples, from which we see that even when initialized with a very rough rectangular window, our algorithm can converge precisely to the underlying low-rank structure of the images, despite occlusion, noisy background, illumination change, and significant deformation.



Issues with more challenging cases. One should expect our algorithm to work well only when the low-rank and sparse structure assumptions, explained in Section 2, hold true. The current algorithm is only a basic version and its capability is still limited, especially when we try to apply it to cases where the assumptions are not fully met. Through the remainder of this section and the next section, we will discuss some of the limitations of TILT, as well as potential extensions that make it work better in some of the more challenging cases. Figure 10 shows some examples on which TILT does not perform as well as it did in previous examples. These examples are arguably more challenging than those shown in Figure 9.



— Figure 10(a) is an example where the size of the input window is too large. Ideally, the algorithm should converge to a region beyond the image boundary. It stops once it hits the boundary, which is only a partially correct solution. In the next section, we will show how this problem can be addressed by combining the basic TILT algorithm with techniques from low-rank matrix completion.

— Figure 10(b) shows a case where the algorithm manages to converge to an approximate solution despite the fact that there is a lack of regularity in the printed text. TILT managed to correct the perspective distortion partially in this case.

— Figure 10(c) shows a case where the algorithm manages to correct the overall pose of the object despite the fact that the object is not planar, similar to some of the cases shown earlier in Figure 1.

— Figure 10(g) shows a failed case, where the perspective deformation is too large for the given input window and the texture is complex (the rank is relatively high). Nevertheless, with slightly better initialization (say, by aggregating TILT results from smaller affine patches or using rough manual inputs), we expect the TILT algorithm to converge to the correct solution. For example, as shown in Figure 11 (top), if we simply shorten the width of the initial window along the main tilted direction, the algorithm manages to find the correct solution.

— Figure 10(h) shows another failed case, where the initial window contains too much of the background, which has the appearance of a random texture with little structure, and the algorithm converges to a local minimum. Nevertheless, with a slightly different initial window that contains less background, the algorithm converges to the correct solution (see Figure 11 (bottom)).

— Figure 10(i) shows a case where the low-rank texture itself is close to a sparse binary image. The algorithm only manages to converge to a partially correct transform: the recovered texture is approximately symmetric along the horizontal direction. In this case, in order to improve the results, we may have to adjust the weights between the low-rank and sparse components in the cost function, or to enforce the symmetry of the desired solution explicitly in the form of additional constraints.

Fig. 8 Shape from (Low-rank) Textures. Top left: The input grid of 60 × 60 windows. Top right: Low-rank textures detected by the TILT algorithm with affine transform and the recovered local affine geometry. Middle left: Use homography to get the projective transformations. Middle right: the resulting image with the marked regions augmented with virtual objects. Bottom row: representative low-rank textures recovered from the marked regions of the buildings.

Fig. 9 Representative Results of TILT. The objects can be categorized as follows. Top two rows: regular patterns and textures; Middle two rows: signs, characters, and printed text; Bottom two rows: bar code, objects with bilateral symmetry. In each case, the red window denotes the input and the green window denotes the final output. The image enclosed by the green window is rectified and displayed to emphasize the low-rank structure.
Fig. 10 Challenging Cases. Panels: (a) boundary effect; (b) lack of regularity; (c) non-planar; (g) large deformation; (h) too much background; (i) sparse regular structures. TILT converges to an approximately correct solution at best for these examples. Top: from left to right: boundary problem, not enough regular texture, non-planar objects. Bottom: from left to right: large perspective distortion; too much random texture in background; sparse (binary image) low-rank structure.





Expected failures. It should come as no surprise that when the assumptions of TILT are violated, it no longer finds the low-rank structure correctly. Figure 12 shows the results of TILT on some examples:

— The first example (Figure 12(a)) shows the limitations of the "low-rank" assumption on some man-made structures: two incompatible dominant low-rank structures (the facade and the shadow) overlap, which results in an overall high-rank region. TILT actually aligns to the orientation of the strong shadow. In order to make this case work, a simple "low-rank" promoting objective, like the one in TILT, is no longer sufficient.

— The second example (Figure 12(b)) shows another limitation of the low-rank assumption. If the chosen window contains two adjacent low-rank regions, each of which is distorted differently, the combined region might no longer be low-rank when subject to one global affine or projective transformation. Proper segmentation of the different low-rank regions is needed before TILT can work correctly on each of the low-rank regions; or TILT has to be extended to simultaneously handle multiple domain transformations.

— Although TILT is designed to be robust to corruptions or occlusions, it is effective only when the amount of corruption is not too large. As shown in Figure 12(c), if there is too much occlusion, TILT cannot be expected to succeed, even though human vision is still capable of perceiving the building structures behind the tree. It remains to be seen whether the robustness of TILT can be improved to handle such challenging cases.

— As mentioned earlier in Section 2, TILT is not designed to work on random textures, such as the one shown in Figure 12(d). Although there has been work in the literature showing that it is possible to infer the approximate orientation of the flower bed based on statistical properties of the random texture, TILT is certainly not designed to handle such cases: it is effective only for regular symmetric textures, but not for random textures.

Fig. 11 Effect of Initialization. For the examples in Figure 10(g) and (h) where TILT had failed earlier, the correct solution is found with a slightly different initialization, in both cases by reducing the horizontal width of the initial (red) window.



5 Potential Modifications and Extensions 

The TILT algorithm proposed in this paper is still rather 
rudimentary. Nevertheless, due to its simplicity, it can 
be easily modified or extended to handle more complex 
scenarios in natural images. In this section, we demon- 
strate this with three possible extensions. The reader 
should be aware that we do not claim that we have 
already given the best solution to each problem dis- 
cussed here. Instead, the goal is merely to show the 
readers some basic ideas about how to modify TILT. In 
fact, we believe each of these problems deserves a much
more thorough investigation so that more effective and 
efficient algorithms could be developed in the future. 



5.1 Matrix Completion for Boundary Effects 

We note that in Step 3 of Algorithm 1, we update the
transformation parameters $\tau$, and recompute the
transformed image $I \circ \tau$ in Step 1 of the subsequent
iteration. While this is conceptually sound, it poses a
serious problem in practice, because real images always
have finite support or size. So, if the window containing
the texture of interest is close to the image boundary,
then the transformed image window $I \circ \tau$ might not be
well-defined at all pixels. The conventional ways to
treat this problem are either to assume that the region
outside the image has zero pixel values, or to interpolate
them from the boundary pixels while ensuring some
degree of smoothness. The former approach is ill-suited to
our problem since it may destroy the low-rank structure
of the texture inside the image (hence TILT may fail
to converge to the correct solution, as shown in Figure
10(a)), while the latter introduces more free parameters
to the algorithm, namely the choice of the interpolation
function.

This problem can actually be handled in a more 
principled manner. We treat the pixels that fall outside 
the image boundary as missing entries of the low-rank 
matrix to be recovered. This formulation is in a similar 
spirit as the low-rank matrix completion problem that
has been extensively studied recently (Recht et al, 2008;
Candes and Recht, 2008; Candes and Tao, 2010). Let
$\Omega$ represent the set of pixels that are located inside the
image boundary after transformation. Then, we modify
the constraint in the linearized problem as follows:

$$\pi_\Omega(I \circ \tau + \nabla I \, \Delta\tau) = \pi_\Omega(I^0 + E), \qquad (18)$$



where $\pi_\Omega(\cdot)$ denotes the projection operator onto the
set of entries with support in $\Omega$. Thus, we apply the
constraint only on the set of pixels at which the transformed
image $I \circ \tau$ is well-defined. Since $\pi_\Omega(\cdot)$ is a linear
operator, the resulting optimization problem is still a
convex program and can be solved by the ALM algorithm
outlined in Section 3.1.$^{19}$ Figure 13 shows two
examples of how matrix completion could improve the
performance of TILT when the chosen window is too
close to the boundaries of the image and the correct
solution extends outside the original image.
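The masked constraint (18) can be illustrated with a small NumPy sketch. This is not the implementation used in the paper: the helper names `inside_mask` and `masked_residual`, the (column, row) coordinate convention, and the toy window sizes are all assumptions made for illustration. The sketch builds the set $\Omega$ for an affine warp and evaluates the constraint residual only on $\Omega$, so that undefined pixels outside the image never enter the optimization.

```python
import numpy as np

def inside_mask(shape_out, A, b, shape_img):
    """Boolean mask Omega: True where a warped window pixel lands inside the image.

    Each window pixel x = (col, row) is mapped to A @ x + b in image coordinates
    (hypothetical convention, chosen only for this sketch).
    """
    h, w = shape_out
    H, W = shape_img
    rows, cols = np.mgrid[0:h, 0:w]
    x = np.stack([cols.ravel(), rows.ravel()]).astype(float)  # 2 x (h*w)
    y = A @ x + b[:, None]
    inside = (y[0] >= 0) & (y[0] <= W - 1) & (y[1] >= 0) & (y[1] <= H - 1)
    return inside.reshape(h, w)

def masked_residual(warped, jac_dtau, low_rank, sparse, omega):
    """pi_Omega(I o tau + grad I * dtau) - pi_Omega(I0 + E); entries outside Omega are zero."""
    r = warped + jac_dtau - low_rank - sparse
    return np.where(omega, r, 0.0)

# Toy example: a 4x4 window shifted so that its last column leaves a 6x6 image.
A = np.eye(2)
b = np.array([3.0, 1.0])
omega = inside_mask((4, 4), A, b, (6, 6))

# Pixels outside the image are undefined (NaN here); the mask keeps them out of the residual.
warped = np.ones((4, 4))
warped[:, 3] = np.nan
residual = masked_residual(warped, np.zeros((4, 4)), np.ones((4, 4)), np.zeros((4, 4)), omega)
```

Because the mask only selects entries, the masked constraint stays linear in the unknowns, which is the property the text relies on to keep the problem convex.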



5.2 Enforcing Reflective Symmetry 

Notice that "low-rank" is merely the result of many 
types of regularities and symmetries. However, a low- 
rank texture need not necessarily be symmetric. Hence, 

19 One can even handle small noise in this case, as shown in the
work of Yuan and Tao (2010).
Fig. 12 Failure Cases. (a) high-rank structures; (b) two low-rank regions; (c) too much occlusion; (d) random textures. TILT fails to recover the geometry of these images since they deviate from the assumptions under which TILT
is designed to work. From left to right: two incompatible dominant low-rank structures, overlapped or adjacent; too much occlusion;
random textures.







Fig. 13 TILT with or without Matrix Completion. Left: Basic TILT without matrix completion - TILT stops when the region 
goes over the image boundaries before it converges to the correct transform; Right: TILT with matrix completion - with the same 
initialization it converges to the correct transform. 



if we intend to recover a symmetric texture, it might 
not be sufficient to impose only the "low-rank" objec- 
tive. For instance, although many of the examples seen 
earlier have reflective symmetry, the axis of symmetry 
is not necessarily always at the center of the recov- 
ered low-rank region. So in order to ensure that the re- 
covered low-rank region has such symmetry, additional 
constraints need to be imposed on TILT. 

Suppose that $I^0 \in \mathbb{R}^{m \times n}$ represents the image of a
texture with reflective symmetry. Without any loss of
generality, we may assume that the axis of symmetry is
horizontal. Then, the reflective symmetry of $I^0$ can be
expressed mathematically as

$$I^0(i, j) = I^0(m + 1 - i, \, j), \quad \forall (i, j) \in \{1 : m\} \times \{1 : n\}. \qquad (19)$$

In general, for any type of symmetry for $I^0$, we can find
an invertible linear mapping $g : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ such that
$g(I^0) = I^0$.$^{20}$ Thus, we may add any desired symmetry
as an additional set of constraints to the linearized
convex program in the TILT framework. Since the
constraints from symmetry are all linear in $I^0$, we can
easily use the ALM algorithm described in Section 3.1,
with minor modifications, to solve the new constrained
optimization problem.

20 For reflective symmetry, $g$ is its own inverse.

We have implemented a modified version of TILT
which enforces the recovered low-rank component to
have reflective symmetry in both the x- and y-directions.$^{21}$
Figure 14 (top) shows the result of the modified algorithm
on a checker-board with reflective symmetry in
the x- and y-directions enforced. Notice that the
converged region is indeed symmetric in both directions.
Figure 14 (bottom) shows the new converged results of
the same stop sign example in Figure 10 with the same
initialization. Notice that this is in fact a very challenging
case for TILT as the foreground (the sign) is very
sparse in the image domain. The recovered low-rank
part A is indeed very symmetric and the sparse part E
accounts for all sparse deviations from the symmetry
(including asymmetry in the letters).
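As a minimal sketch of the symmetry constraint in (19), assuming a horizontal axis of symmetry, the mapping $g$ can be written as a row flip. The function names `reflect_rows` and `symmetrize` are hypothetical, and the projection shown here is only one simple way to obtain a matrix satisfying $g(M) = M$; it is not claimed to be how the constraint is handled inside the ALM iterations.

```python
import numpy as np

def reflect_rows(M):
    """g(M)(i, j) = M(m + 1 - i, j): reflection about a horizontal axis (1-based indices)."""
    return M[::-1, :]

def symmetrize(M):
    """Orthogonal projection of M onto the linear subspace {M : g(M) = M}."""
    return 0.5 * (M + reflect_rows(M))

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))   # arbitrary test matrix, not an image from the paper
S = symmetrize(M)

# g is linear and is its own inverse; S satisfies the symmetry constraint g(S) = S.
is_involution = np.array_equal(reflect_rows(reflect_rows(M)), M)
is_symmetric = np.allclose(reflect_rows(S), S)
```

Since `reflect_rows` only permutes entries, the constraint $g(I^0) = I^0$ is a set of linear equalities in the entries of $I^0$, which is why it can be appended to the linearized convex program without changing its character.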



5.3 TILT for Rotational Symmetry 

Many other structural properties may be converted to 
a low-rank objective. For instance, the image of a ro- 
tationally symmetric pattern need not be a low-rank 
matrix, but it can be converted to one. To deal with ro- 
tational symmetry, we will consider circular windows, 
instead of rectangular ones. Each circular window is 
uniquely determined by its center and its radius. Clearly, 
the image region enclosed by such a window is not a 
matrix. However, it can be converted to one by consid- 
ering a Frieze-expansion pattern (FEP) of the region 
(Liu et al, 2004; Lee and Liu, 2010).



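Suppose that a matrix $I^0 \in \mathbb{R}^{m \times n}$ is the FEP of a
circular window in an image with center at the origin
and radius $R$. Then, the mapping $\tau$ between an entry
$(x_0, y_0)$ in $I^0$ and its corresponding pixel in the image
is given by

$$\tau(x_0, y_0) = \left( \frac{R x_0}{m} \cos\left(\frac{2\pi y_0}{n}\right), \; \frac{R x_0}{m} \sin\left(\frac{2\pi y_0}{n}\right) \right). \qquad (20)$$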



If the center and radius of the circular window are cho- 
sen correctly, then the above FEP mapping gives rise to 
a low-rank matrix. However, in practice, the exact po- 
sition of the window is not known a priori. In addition, 
there could be an additional deformation of the pattern
due to the viewpoint. Figure 15(a) shows a representative
input image. Suppose we model the deformation
by an affine transformation. Then, the mapping (20)
from the low-rank matrix to the input image can be
rewritten as

$$\tau(x_0, y_0) = H \cdot \begin{bmatrix} \frac{R x_0}{m} \cos\left(\frac{2\pi y_0}{n}\right) & \frac{R x_0}{m} \sin\left(\frac{2\pi y_0}{n}\right) & 1 \end{bmatrix}^T, \qquad (21)$$

where $H$ represents an affine transformation in homogeneous
coordinates. We can easily modify TILT to deal
with the combined deformation of the FEP and the
affine map, and the algorithm can simultaneously recover
the correct center of symmetry and the affine
deformation. We show the results of such an algorithm on
one rotationally symmetric pattern in Figure 15.

21 In order to allow the low-rank region to move freely to a
symmetric region, we have to remove the constraints on the
translation.
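The FEP mapping (20) can be sketched in a few lines of NumPy. The example below is illustrative only: the synthetic `pattern` function, the helper `fep_coords`, and the sizes $m$, $n$, $R$ are assumptions, and the image is evaluated as a continuous function to avoid any interpolation choice. It shows that a rotationally symmetric pattern yields a low-rank FEP when the circular window is centered correctly, and a higher-rank one when the center is off.

```python
import numpy as np

def fep_coords(m, n, R, center=(0.0, 0.0)):
    """Image coordinates of FEP entries (x0, y0), x0 = 1..m, y0 = 1..n, following (20)."""
    x0 = np.arange(1, m + 1)[:, None]      # radial index: one row per radius
    y0 = np.arange(1, n + 1)[None, :]      # angular index: one column per angle
    radius = R * x0 / m
    angle = 2.0 * np.pi * y0 / n
    u = center[0] + radius * np.cos(angle)
    v = center[1] + radius * np.sin(angle)
    return u, v

def pattern(u, v, k=6):
    """Synthetic k-fold rotationally symmetric pattern centered at the origin (assumption)."""
    r = np.hypot(u, v)
    theta = np.arctan2(v, u)
    return r * (1.0 + np.cos(k * theta))

m, n, R = 32, 64, 1.0
fep_centered = pattern(*fep_coords(m, n, R))                       # correct center
fep_shifted = pattern(*fep_coords(m, n, R, center=(0.3, 0.2)))     # wrong center

rank_centered = np.linalg.matrix_rank(fep_centered, tol=1e-8)
rank_shifted = np.linalg.matrix_rank(fep_shifted, tol=1e-8)
```

With the correct center, each FEP entry factors into a radial term times an angular term, so the matrix is an outer product. A misplaced center breaks this factorization, which is the kind of rank gap that a low-rank objective can use to adjust the center and the deformation.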



6 Conclusions and Future Directions 

In this paper, we introduce a novel framework in which 
an image window is viewed as a matrix and the rank of 
the matrix is used as a measure of textural simplicity 
in the image window. We have introduced a very effec- 
tive way of extracting precise structure and geometry of 
low-rank textures from their images using iterative con- 
vex optimization techniques. The proposed algorithm 
works effectively and robustly for a wide range of regu- 
lar, symmetric patterns and structures in real images, 
suggesting that the transformed low-rank plus sparse 
structures model is important for modeling real images 
of urban environments and man-made objects. More 
importantly, the proposed tools are highly complemen- 
tary to most existing vision techniques that mainly fo- 
cus on using local features. Instead, by leveraging ef- 
ficient high-dimensional optimization techniques, the 
new tools can process a large image region to extract 
dominant structural and geometric information more 
accurately and robustly in a holistic fashion. 

The proposed TILT scheme is still quite rudimen- 
tary in its formulation and solution. Many aspects of 
it can still be improved. Also, it can be customized 
or extended by incorporating additional structural con- 
straints or by considering different deformation models. 
Conceptually, there should be little difficulty in gener- 
alizing TILT from the linear (affine or projective) trans- 
forms to other classes of possibly nonlinear domain de- 
formations. An important open problem is to derive 
conditions (on the type of signals and the deformation 
groups) under which this framework is guaranteed to 
succeed. As being low-rank is only a necessary but not 
sufficient property for many regular, symmetric pat- 
terns and structures, it is worth investigating in the 
Fig. 14 Reflective Symmetry Imposed. Top row: results on a checker-board. From left to right: the original image $I$; the rectified
image $I \circ \tau$; the recovered low-rank component; and the sparse component $E$. Bottom row: the corresponding results of the stop
sign example (in Figure 10) with reflective symmetry enforced.

Fig. 15 Rotational Symmetry with Affine Transform. (a) Input image and initial circle: input image with inherent rotational symmetry. The symmetry is
not immediately evident due to the deformation caused by the viewpoint. The red window denotes the input and the green window
encloses the symmetric pattern converged to by our algorithm. (b) Frieze-expansion patterns. From left to right: the FEP of the input $I$ (red) window, which
does not exhibit any low-rank structure; the FEP of the output $I \circ \tau$ (green) window recovered by TILT; the corresponding low-rank
texture $I^0$; and the sparse error $E$ in the recovered FEP.



future more pertinent measures or objective functions 
for recovering such patterns and structures despite ge- 
ometric deformation. More generally, this work could 
motivate people to discover new types of (transform- 
invariant) properties that can be extracted effectively 
and efficiently from images in a similar holistic fashion, 
without relying on local features. 

The low-rank textures and the associated geometric 
transformations recovered by TILT can be very use- 
ful for many high-level computer vision tasks such as 
image compression, matching, segmentation, symmetry 
detection, reconstruction of 3D models of urban envi- 



ronments, and recognition of man-made objects. On the 
other side of the coin, in this paper we have not fully ad- 
dressed the issue of detecting the location and scale of 
candidate low-rank regions so as to better initialize and 
apply the TILT algorithm. As some of our experiments 
have suggested, better initialization can significantly 
improve the performance and applicability of TILT to a 
broader range of situations. This leaves plenty of room 
for future investigation on how to improve and aug- 
ment TILT with other computer vision techniques such 
as image segmentation and salient region detection, or 
with other scale-invariant local features such as SIFT. 
Appendix A: Derivation of Linear Constraints

In Section 3, we have proposed to impose two sets of
constraints on the deformation parameters to make the
solution well-defined so as to avoid some pathological
solutions. Here, we show a detailed derivation or
linearization of these constraints. In particular, we present
the derivation for the case when the transformation
group is the set of affine transformations. The derivation
for the homography case is very similar, in case such
constraints need to be imposed.

Constraints on Translation (16). Our first constraint is
that the center of the rectangular window is fixed, i.e.,
if $x_0 = [x_0(1) \;\; x_0(2)]^T$ is the initial center of the window
and $\tau$ is the optimal transformation, then $\tau(x_0) = x_0$.
Since the transformation is affine, we have that $\tau(x) = Ax + b$, where

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

is an invertible matrix and $b \in \mathbb{R}^2$. Suppose we parameterize
our transformation vector as

$$\tau = \begin{bmatrix} A_{11} & A_{21} & A_{12} & A_{22} & b^T \end{bmatrix}^T,$$

then in (16) we have

$$A_t = \begin{bmatrix} x_0(1) & 0 & x_0(2) & 0 & 1 & 0 \\ 0 & x_0(1) & 0 & x_0(2) & 0 & 1 \end{bmatrix}. \qquad (22)$$

For the second set of constraints, let $S(A, b)$ denote
the area of the window after transformation.
Since the area of the window is unchanged by
translation, we denote the area as $S(A)$. Let $e_1$ and
$e_2$ denote two adjacent edges (with the origin as the
common vertex) of the initial square. After transformation,
these edges can be represented by the vectors
$e_1 = (A_{11}, A_{21})$ and $e_2 = (A_{12}, A_{22})$. Then, the area of
the transformed window is given by

$$S(A) = \|e_1\| \, \|e_2\| \sin\theta, \qquad (23)$$

where $\cos\theta = \frac{e_1^T e_2}{\|e_1\| \, \|e_2\|}$. The above equation can be
simplified to

$$S(A) = |A_{11} A_{22} - A_{12} A_{21}|. \qquad (24)$$

Now suppose that the matrix $A$ is perturbed by a small
amount $\Delta A$. Since we require that the new area
$S(A + \Delta A)$ is close to $S(A)$, we impose the constraint that
the first-order term in the Taylor series expansion of
$S(A + \Delta A)$ be zero, i.e.,

$$\nabla_A S(A) \cdot \Delta A = 0. \qquad (25)$$

We now consider the second part of the constraint,
which is to minimize the rate at which the aspect ratio
of the window changes. Since the aspect ratio is
unity for the initial window, we essentially require that
$\|e_1\| = \|e_2\|$ for the transformed window, using the
same notation as above. We define $C(A) = \|e_1\|^2 - \|e_2\|^2$.
Then, ideally, we require $C(A + \Delta A)$ to be close
to zero. Once again, we impose the constraint that the
first-order term in the Taylor series expansion be zero:

$$\nabla_A C(A) \cdot \Delta A = 0. \qquad (26)$$

Combining (25) and (26), and denoting $\tau$ as a vector
of all the transformation parameters, it is easy to see
that we get a linear constraint of the form $A_s \Delta\tau = 0$,
as given in (17).
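The two linear constraints can be written out explicitly and checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: the function names are hypothetical, the parameter ordering is $[A_{11}, A_{21}, A_{12}, A_{22}, b_1, b_2]$ as in the appendix, and $\det A > 0$ is assumed so that the absolute value in (24) can be dropped when differentiating. The rows of `scale_constraint` are the gradients in (25) and (26), worked out here by direct differentiation.

```python
import numpy as np

def translation_constraint(x0):
    """A_t with A_t @ dtau = 0 keeping tau(x0) = x0; tau = [A11, A21, A12, A22, b1, b2]."""
    return np.array([[x0[0], 0.0, x0[1], 0.0, 1.0, 0.0],
                     [0.0, x0[0], 0.0, x0[1], 0.0, 1.0]])

def scale_constraint(A):
    """A_s stacking grad S (area, assuming det A > 0) and grad C (aspect ratio)."""
    grad_S = [A[1, 1], -A[0, 1], -A[1, 0], A[0, 0], 0.0, 0.0]
    grad_C = [2 * A[0, 0], 2 * A[1, 0], -2 * A[0, 1], -2 * A[1, 1], 0.0, 0.0]
    return np.array([grad_S, grad_C])

def area(A):
    """S(A) = A11*A22 - A12*A21 for det A > 0."""
    return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

def aspect(A):
    """C(A) = |e1|^2 - |e2|^2, with e1, e2 the columns of A."""
    e1, e2 = A[:, 0], A[:, 1]
    return e1 @ e1 - e2 @ e2

def unpack(tau):
    """Rebuild (A, b) from the parameter vector tau."""
    A = np.array([[tau[0], tau[2]], [tau[1], tau[3]]])
    return A, tau[4:]

# Toy affine map that fixes the window center x0 (values chosen arbitrarily).
x0 = np.array([2.0, 3.0])
A = np.array([[1.0, 0.2], [-0.1, 1.1]])
b = x0 - A @ x0
tau = np.array([A[0, 0], A[1, 0], A[0, 1], A[1, 1], b[0], b[1]])

# A perturbation in the null space of A_t: any dA, with db chosen to cancel dA @ x0.
dA = np.array([[0.05, -0.02], [0.01, 0.03]])
db = -dA @ x0
dtau = np.array([dA[0, 0], dA[1, 0], dA[0, 1], dA[1, 1], db[0], db[1]])
A_new, b_new = unpack(tau + dtau)
center_after = A_new @ x0 + b_new   # stays at x0 because A_t @ dtau = 0
```

Because an affine map is linear in its parameters, the translation constraint holds exactly, whereas the area and aspect-ratio constraints only hold to first order in $\Delta\tau$.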